1. INTRODUCTION

Traditional statistical inference considers relatively 
small data sets and the corresponding theoretical 
analysis focuses on the asymptotic behavior of a sta- 
tistical estimator when the number of samples ap- 
proaches infinity. However, many data sets encoun- 
tered in modern applications have dimensionality 
significantly larger than the number of training data 
available, and for such problems the classical statis- 
tical tools become inadequate. In order to analyze 
high-dimensional data, new statistical methodology 
and the corresponding theory have to be developed. 

In the past decade, sparse modeling and the corre- 
sponding use of sparse regularization methods have 
emerged as a major technique to handle high-dimen- 
sional data. While the data dimensionality is high, 
the basic assumption in this approach is that the ac- 
tual estimator is sparse in the sense that only a small 
number of components are nonzero. On the practical 
side, the sparsity phenomenon has been ubiquitously 
observed in applications, including signal recovery, 
genomics, computer vision, etc. On the theoretical
side, this assumption makes it possible to overcome
the problem associated with estimating more
parameters than the number of observations, which is
impossible to deal with in the classical setting.

There are a number of challenges, including devel- 
oping new theories for high-dimensional statistical 
estimation as well as new formulations and compu- 
tational procedures. Related problems have received 
a lot of attention in various research fields, including 
applied math, signal processing, machine learning, 
statistics and optimization. Rapid advances have been
made in recent years. In view of the growing research 
activities and their practical importance, we have or- 
ganized this special issue of Statistical Science with 
the goal of providing overviews of several topics in 
modern sparsity analysis and associated regulariza- 
tion methods. Our hope is that general readers will 
get a broad idea of the field as well as current re- 
search directions. 

2. SPARSE MODELING AND 
REGULARIZATION 

One of the central problems in statistics is linear
regression, where we consider an $n \times p$ design
matrix $X$ and an $n$-dimensional response vector
$Y \in \mathbb{R}^n$ so that

(1) $Y = X\beta + \epsilon,$

where $\beta \in \mathbb{R}^p$ is the true regression
coefficient vector and $\epsilon \in \mathbb{R}^n$ is a
noise vector. In the case of $n < p$, this problem is
ill-posed because the number of parameters is more than
the number of observations. This ill-posedness can be
resolved by imposing a sparsity constraint: that is, by
assuming that $\|\beta\|_0 \le s$ for some $s$, where the
$\ell_0$-norm of $\beta$ is defined as
$\|\beta\|_0 = |\mathrm{supp}(\beta)|$, and the support
set of $\beta$ is defined as
$\mathrm{supp}(\beta) := \{j : \beta_j \ne 0\}$. If
$s \ll n$, then the effective number of parameters in (1)
is smaller than the number of observations.
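
As a small illustration of these definitions, the following Python sketch computes the support set and the $\ell_0$-norm of a coefficient vector and simulates responses from the linear model (1). The function names, the Gaussian noise and the specific numbers are assumptions made for this example only.

```python
import random

def support(beta, tol=0.0):
    """Indices j with beta_j != 0 (more precisely, |beta_j| > tol)."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]

def l0_norm(beta, tol=0.0):
    """Number of nonzero coefficients, |supp(beta)|."""
    return len(support(beta, tol))

def simulate(X, beta, sigma, rng):
    """Draw Y = X beta + eps with i.i.d. N(0, sigma^2) noise (an assumption)."""
    return [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, sigma)
            for row in X]

rng = random.Random(0)
n, p = 4, 6                                # fewer observations than parameters
beta = [0.0, 1.5, 0.0, 0.0, -2.0, 0.0]     # sparse truth with s = 2
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
Y = simulate(X, beta, sigma=0.1, rng=rng)

print(support(beta), l0_norm(beta))        # [1, 4] 2
```

Here $n < p$, so the unconstrained problem is ill-posed, but the sparsity level $s = 2$ is smaller than $n = 4$.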

The sparsity assumption may be viewed as the
classical model selection problem, where models are
indexed by the set of nonzero coefficients. The classical
model selection criteria such as AIC, BIC or $C_p$
[1, 7, 11] naturally lead to the so-called $\ell_0$
regularization estimator:

(2) $\hat\beta^{(\ell_0)} = \arg\min_{\beta \in \mathbb{R}^p} \left[ \|X\beta - Y\|_2^2 + \lambda \|\beta\|_0 \right].$

The main difference between modern $\ell_0$ analysis in
high-dimensional statistics and the classical model
selection methods is the choice of $\lambda$: the modern
analysis requires choosing a larger $\lambda$ than that
considered in the classical model selection setting,
because it is necessary to compensate for the effect of
considering many models in the high-dimensional setting.
The analysis for $\ell_0$ regularization in the
high-dimensional setting (e.g., [15] in this issue)
employs different techniques and the results obtained are
also different from the classical literature.
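
To make the $\ell_0$ regularization estimator concrete, the sketch below minimizes a penalized least-squares objective of the form $\|X\beta - Y\|_2^2 + \lambda\|\beta\|_0$ by enumerating candidate supports. It is a toy illustration, not a practical method: the objective scaling, the helper names and the assumption that every candidate set of columns is linearly independent are ours, and the number of supports grows as $2^p$.

```python
from itertools import combinations

def solve(A, b):
    """Solve A x = b (A square, assumed nonsingular) by Gaussian elimination."""
    k = len(b)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for q in range(c, k + 1):
                M[r][q] -= f * M[c][q]
    x = [0.0] * k
    for c in reversed(range(k)):
        tail = sum(M[c][q] * x[q] for q in range(c + 1, k))
        x[c] = (M[c][k] - tail) / M[c][c]
    return x

def fit_on_support(X, Y, S):
    """Least squares restricted to the columns in S; returns (coefficients, RSS)."""
    n = len(X)
    if not S:
        return [], sum(y * y for y in Y)
    gram = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in S] for a in S]
    xty = [sum(X[i][a] * Y[i] for i in range(n)) for a in S]
    coef = solve(gram, xty)
    rss = sum((Y[i] - sum(c * X[i][a] for c, a in zip(coef, S))) ** 2
              for i in range(n))
    return coef, rss

def l0_estimator(X, Y, lam, max_size):
    """Exhaustive search over supports of size <= max_size."""
    p = len(X[0])
    best_obj, best_S, best_coef = None, (), []
    for k in range(max_size + 1):
        for S in combinations(range(p), k):
            coef, rss = fit_on_support(X, Y, S)
            obj = rss + lam * k          # squared error plus lam * |S|
            if best_obj is None or obj < best_obj:
                best_obj, best_S, best_coef = obj, S, coef
    beta = [0.0] * p
    for j, c in zip(best_S, best_coef):
        beta[j] = c
    return beta, best_S

# Hypothetical toy data with an orthonormal design.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [3.0, 0.2, -1.0]
beta_hat, S_hat = l0_estimator(X, Y, lam=0.5, max_size=3)
print(S_hat)   # (0, 2): coordinates with Y_j^2 > lam are kept unshrunk
```

With this orthonormal design the search reduces to hard thresholding: a coordinate is kept, without shrinkage, exactly when its squared response exceeds $\lambda$.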

The $\ell_0$ regularization formulation leads to a
nonconvex optimization problem that is difficult to solve
computationally. On the other hand, an important
requirement for modern high-dimensional problems is to
design computationally efficient and statistically
effective algorithms. Therefore, the main focus of the
existing literature is on convex relaxation methods that
use $\ell_1$-regularization (Lasso) to replace sparsity
constraints:

(3) $\hat\beta^{(\ell_1)} = \arg\min_{\beta \in \mathbb{R}^p} \left[ \|X\beta - Y\|_2^2 + \lambda \|\beta\|_1 \right].$



This method is referred to as Lasso [12] in the
literature and its theoretical properties have been
intensively studied. Since the formulation is regarded as
an approximation of (2), a key question is how good this
approximation is, and how good the estimator
$\hat\beta^{(\ell_1)}$ is for estimating $\beta$.
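
One common way to compute an $\ell_1$-regularized least-squares estimate is cyclic coordinate descent with soft-thresholding. The sketch below assumes the objective $\|X\beta - Y\|_2^2 + \lambda\|\beta\|_1$. The function names, the fixed iteration count and the toy data are illustrative assumptions, and this is not necessarily the algorithm used in the cited work.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, Y, lam, n_iter=200):
    """Cyclic coordinate descent for min_b ||X b - Y||_2^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(Y)                      # residual Y - X beta at beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                 # an all-zero column stays at 0
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Hypothetical toy data with an orthonormal design.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [3.0, 0.2, -1.0]
beta_hat = lasso_cd(X, Y, lam=1.0)
print(beta_hat)   # [2.5, 0.0, -0.5]
```

For this orthonormal design the solution is the soft-thresholding of $Y$ at level $\lambda/2$: small coordinates are set exactly to zero, and large ones are shrunk toward zero, which is one way the $\ell_1$ solution differs from an $\ell_0$ solution.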

Many extensions of Lasso have appeared in the 
literature for more complex problems. One exam- 
ple is group Lasso [14] that assumes that variables 
are selected in groups. Another extension is the es- 
timation of graphical models, where one can employ 
Lasso to estimate unknown graphical model struc- 
tures [3, 8]. A third example is matrix regularization, 
where the concept of sparsity can be replaced by the 
concept of low-rankness, and sparsity constraints be- 
come low-rank constraints. Of special interest is the 
so-called matrix completion problem, where we want 
to recover a matrix from a few observations of the 
matrix entries. This problem is encountered in
recommender system applications (e.g., a person who buys
a book at amazon.com will be recommended other books
purchased by other users with similar interests), and
low-rank matrix factorization is one of the main
techniques for this problem. Similar to sparsity
regularization, using low-rank regularization leads 
to nonconvex formulations and, thus, it is natural 
to consider its convex relaxation, which is referred
to as trace-norm (or nuclear-norm) regularization.
The theoretical properties and numerical algorithms 
for trace-norm regularization methods have received 
attention. 
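
To illustrate how a group penalty selects variables in groups, the sketch below applies block soft-thresholding, the proximal operator of an unweighted penalty $t \sum_g \|\beta_g\|_2$. This is one building block often used in algorithms for group-type penalties. The unweighted form, the non-overlapping groups and the function names are assumptions made for the example.

```python
import math

def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2: shrink the whole block toward zero."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)            # the entire group is removed
    scale = 1.0 - t / norm
    return [scale * x for x in v]

def prox_group_lasso(beta, groups, t):
    """Apply block soft-thresholding to each (non-overlapping) group."""
    out = list(beta)
    for g in groups:
        shrunk = group_soft_threshold([beta[j] for j in g], t)
        for j, val in zip(g, shrunk):
            out[j] = val
    return out

beta = [3.0, 4.0, 0.3, 0.4]
groups = [[0, 1], [2, 3]]                # hypothetical grouping of coordinates
shrunk = prox_group_lasso(beta, groups, t=1.0)
print(shrunk)   # first group scaled by 0.8, second group set to zero
```

The first group has Euclidean norm 5 and is scaled by 0.8, while the second has norm 0.5, which is below the threshold, so both of its coordinates are zeroed together.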

3. ARTICLES IN THIS ISSUE 

The eight articles in this issue present general
overviews of the state of the art in a number of
different topics concerning sparsity analysis and
regularization methods. Moreover, many articles go
beyond the current state of the art in various ways.
Therefore, these articles not only give some high-level
ideas about the current topics, but will also be
valuable for experts working in the field.

• Bach, Jenatton, Mairal and Obozinski (Structured
sparsity through convex optimization, [2]) study
convex relaxations based on structured norms
incorporating further structural prior knowledge. An
extension of the standard $\ell_0$ sparsity model that
has received a lot of attention in recent years is
structured sparsity. The basic idea is that not all
sparsity patterns for $\mathrm{supp}(\beta)$ are equally
likely. A simple example is group sparsity, where nonzero
coefficients occur together in predefined groups.
More complex structured sparsity models have
been investigated in recent years. Although the
paper by Bach et al. focuses on the convex
optimization approach, the authors also give an extensive
survey of recent developments, including the use
of submodular set functions.

• van de Geer and Müller (Quasi-likelihood and/or
robust estimation in high dimensions, [13]) extend
$\ell_1$ regularization methods to generalized linear
models. This involves consideration of loss functions
beyond the usual least-squares loss and, in particular,
loss functions arising via quasi-likelihoods.

• Huang, Breheny and Ma (A selective review of 
group selection in high-dimensional models, [5])
provide a detailed review of the most important 
special case of structured sparsity, namely, group 
sparsity. Their review covers both convex relax- 
ation (or group Lasso) and approaches based on 
nonconvex group penalties. 

• Giraud, Huet and Verzelen (High-dimensional
regression with unknown variance, [4]) address issues
in high-dimensional regression estimation connected
with lack of knowledge of the error variance. In the
standard Lasso formulation (3), the regularization
parameter $\lambda$ is considered as a tuning parameter
that needs to be chosen proportionally to the standard
deviation $\sigma$ of the noise vector. A natural
question is whether it is possible to automatically
estimate $\sigma$ instead of leaving $\lambda$ as a
tuning parameter. This problem has received much
attention and a number of developments have been made
in recent years. This paper reviews and compares
several approaches to this problem.

• Lafferty, Liu and Wasserman (Sparse nonparametric
graphical models, [6]) discuss another important
topic in sparsity analysis, the graphical
model estimation problem. While much of the cur- 
rent work assumes that the data come from a mul- 
tivariate Gaussian distribution, this paper goes 
beyond the standard practice. The authors out- 
line a number of possible approaches and intro- 
duce more flexible models for the problem. The 
authors also describe some of their recent work, 
and describe future research directions. 

• Negahban, Ravikumar, Wainwright and Yu (A uni- 
fied framework for high-dimensional analysis of 
M-estimators with decomposable regularizers, [9]) 
provide a unified treatment of existing approaches 
to sparse regularization. The paper extends the 
standard sparse recovery analysis of $\ell_1$-regularized
least squares regression problems by introducing 
a general concept of restricted strong convexity. 
This allows the authors to study more general 
formulations with different convex loss functions 
and a class of "decomposable" regularization con- 
ditions. 

• Rigollet and Tsybakov (Sparse estimation by ex- 
ponential weighting, [10]) present a thorough anal- 
ysis of oracle inequalities in the context of model 
averaging procedures, a class of methods which 
has its origin in the Bayesian literature. Model
averaging is in general more stable than model 
selection. For example, in the scenario that two 
models are very similar and only one is correct, 
model selection forces us to choose one of the 
models even if we are not certain which model 
is true. On the other hand, a model averaging 
procedure does not force us to choose one of the 
two models, but only to take the average of the 
two models. This is beneficial when several of the 
models are similar and we cannot tell which is the 
correct one. The modern analysis of model averag- 
ing procedures leads to oracle inequalities that are 
sharper than the corresponding oracle inequalities 
for model selection methods such as Lasso. The 
authors give an extensive discussion of such or- 
acle inequalities using an exponentially weighted 
model averaging procedure. Such procedures have 
advantages over model selection when the under- 
lying models are correlated and when the model 
class is misspecified. 

• Zhang and Zhang (A general theory of concave
regularization for high-dimensional sparse estimation
problems, [15]) focus on nonconvex penalties and study
a variety of issues related to such penalties. Although
the natural formulation of a sparsity constraint is
$\ell_0$ regularization, due to its computational
difficulty, most of the recent literature focuses on
the simpler $\ell_1$ regularization method (Lasso) that
approximates $\ell_0$ regularization. However, it is
also known that $\ell_1$ regularization is not a very
good approximation to $\ell_0$ regularization.
This leads to the study of nonconvex penalties. 
The nonconvex formulations are both harder to 
analyze statistically and harder to handle com- 
putationally. Some fundamental understanding of 
high-dimensional nonconvex procedures has only 
started to emerge recently. Nevertheless, some ba- 
sic questions have remained unanswered: for ex- 
ample, properties of the global solution of non- 
convex formulations and whether it is possible to 
compute the global optimal solution efficiently un- 
der suitable conditions. The authors go a consid- 
erable distance toward providing a general the- 
ory that answers some of these fundamental ques- 
tions. 
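
The contrast between model selection and model averaging described in the Rigollet and Tsybakov item can be sketched as follows. Each candidate model receives a weight proportional to a prior weight times $\exp(-\text{risk}/T)$, and predictions are averaged with these weights. The temperature $T$, the uniform prior, the function names and the toy numbers are assumptions for illustration, and the exact procedure analyzed in [10] may differ.

```python
import math

def exponential_weights(risks, temperature, prior=None):
    """Weights proportional to prior_j * exp(-risk_j / temperature)."""
    m = len(risks)
    if prior is None:
        prior = [1.0 / m] * m            # uniform prior (an assumption)
    r_min = min(risks)                   # subtract the minimum for stability
    raw = [pi * math.exp(-(r - r_min) / temperature)
           for r, pi in zip(risks, prior)]
    total = sum(raw)
    return [w / total for w in raw]

def aggregate(predictions, weights):
    """Weighted average of the candidate models' prediction vectors."""
    n = len(predictions[0])
    return [sum(w * pred[i] for w, pred in zip(weights, predictions))
            for i in range(n)]

# Two nearly indistinguishable models and one clearly worse model.
risks = [1.00, 1.02, 5.00]
predictions = [[1.0, 2.0], [1.1, 2.1], [4.0, -3.0]]

weights = exponential_weights(risks, temperature=1.0)
averaged = aggregate(predictions, weights)

# Model selection keeps only the single model with the smallest risk.
selected = min(range(len(risks)), key=lambda j: risks[j])
print(weights, averaged, selected)
```

In this toy example the two similar models receive nearly equal weight and the poor model receives very little, whereas selection commits entirely to the first model on the basis of a small difference in risk.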

REFERENCES 

[1] Akaike, H. (1973). Information theory and an exten- 
sion of the maximum likelihood principle. In Sec- 
ond International Symposium on Information The- 
ory (Tsahkadsor, 1971) 267-281. Akademiai Kiado, 
Budapest. MR0483125 

[2] Bach, F., Jenatton, R., Mairal, J. and Obozinski, G.
(2012). Structured sparsity through convex
optimization. Statist. Sci. 27 450-468.

[3] Banerjee, O., El Ghaoui, L. and d'Aspremont, A.
(2008). Model selection through sparse maximum
likelihood estimation for multivariate Gaussian or
binary data. J. Mach. Learn. Res. 9 485-516.
MR2417243

[4] Giraud, C., Huet, S. and Verzelen, N. (2012).
High-dimensional regression with unknown
variance. Statist. Sci. 27 500-518.

[5] Huang, J., Breheny, P. and Ma, S. (2012). A selec- 
tive review of group selection in high dimensional 
models. Statist. Sci. 27 481-499. 

[6] Lafferty, J., Liu, H. and Wasserman, L. (2012).
Sparse nonparametric graphical models. Statist.
Sci. 27 519-537.

[7] Mallows, C. L. (1973). Some comments on $C_p$.
Technometrics 15 661-675.

[8] Meinshausen, N. and Bühlmann, P. (2006).
High-dimensional graphs and variable selection with the
lasso. Ann. Statist. 34 1436-1462. MR2278363

[9] Negahban, S., Ravikumar, P., Wainwright, M. J.
and Yu, B. (2012). A unified framework for
high-dimensional analysis of M-estimators with
decomposable regularizers. Statist. Sci. 27 538-557.
[10] Rigollet, P. and Tsybakov, A. B. (2012). Sparse es- 
timation by exponential weighting. Statist. Sci. 27 
558-575. 



J. WELLNER AND T. ZHANG 




[11] Schwarz, G. (1978). Estimating the dimension of a 
model. Ann. Statist. 6 461-464. MR0468014 

[12] Tibshirani, R. (1996). Regression shrinkage and selec- 
tion via the lasso. J. R. Stat. Soc. Ser. B Stat. 
Methodol. 58 267-288. MR1379242 

[13] van de Geer, S. and Müller, P. (2012).
Quasi-likelihood and/or robust estimation in high
dimensions. Statist. Sci. 27 469-480.



[14] Yuan, M. and Lin, Y. (2006). Model selection and 
estimation in regression with grouped variables. 
J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49-67. 
MR2212574 

[15] Zhang, C.-H. and Zhang, T. (2012). A general the- 
ory of concave regularization for high dimensional 
sparse estimation problems. Statist. Sci. 27 576- 
593. 



